Real time clustering of time series using triangular potentials
Motivated by the problem of computing investment portfolio weightings, we
investigate various methods of clustering as alternatives to traditional
mean-variance approaches. Such methods offer significant practical benefits
since they remove the need to invert a sample
covariance matrix, which can suffer from estimation error and will almost
certainly be non-stationary. The general idea is to find groups of assets which
share similar return characteristics over time and treat each group as a single
composite asset. We then apply inverse volatility weightings to these new
composite assets. In the course of our investigation we devise a method of
clustering based on triangular potentials and we present associated theoretical
results as well as various examples based on synthetic data.
Comment: AIFU1
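The clustering-then-weighting scheme described above can be sketched as follows. This is a minimal illustration on synthetic returns: a hard-coded cluster assignment stands in for the paper's triangular-potential clustering, and the composite-asset construction (equal-weight members, then inverse-volatility weight each composite) is an assumption about the simplest form of the procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic daily returns for 6 assets (rows: days, cols: assets).
returns = rng.normal(0.0, 0.01, size=(250, 6))

# Hypothetical cluster labels standing in for the triangular-potential
# clustering output (illustrative only).
labels = np.array([0, 0, 1, 1, 2, 2])

weights = np.zeros(returns.shape[1])
for c in np.unique(labels):
    members = np.flatnonzero(labels == c)
    # Treat the cluster as one composite asset: equal-weight its members.
    composite = returns[:, members].mean(axis=1)
    # Inverse-volatility weight for the composite, split across members.
    weights[members] = (1.0 / composite.std()) / len(members)

weights /= weights.sum()  # normalize to a fully invested portfolio
```

Note that no covariance matrix is inverted anywhere: only per-composite volatilities are estimated, which is the practical benefit the abstract highlights.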
An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit
We study the problem of information sharing and cooperation in Multi-Player
Multi-Armed bandits. We propose the first algorithm that achieves logarithmic
regret for this problem. Our results are based on two innovations. First, we
show that a simple modification to a successive elimination strategy can be
used to allow the players to estimate their suboptimality gaps, up to constant
factors, in the absence of collisions. Second, we leverage the first result to
design a communication protocol that successfully uses the small reward of
collisions to coordinate among players, while preserving meaningful
instance-dependent logarithmic regret guarantees.
Comment: 44 pages
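The first ingredient above, estimating suboptimality gaps via successive elimination, can be sketched in the single-player case. This is a generic successive-elimination loop, not the paper's algorithm: the arm means, confidence radii, and the rule "record twice the radius at elimination time as the gap estimate" are illustrative assumptions; the collision-based communication protocol is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)
means = np.array([0.45, 0.55, 0.80])  # illustrative Bernoulli arm means
n_arms = len(means)

active = set(range(n_arms))
pulls = np.zeros(n_arms)
sums = np.zeros(n_arms)
gap_estimates = {}  # arm -> gap estimate, correct up to constant factors

for rnd in range(1, 3001):
    for a in active:  # pull every surviving arm once per round
        sums[a] += float(rng.random() < means[a])
        pulls[a] += 1
    est = sums / np.maximum(pulls, 1)
    # Anytime confidence radius (union bound over arms and rounds).
    radius = np.sqrt(np.log(4.0 * n_arms * rnd * rnd) / (2.0 * np.maximum(pulls, 1)))
    best_lcb = max(est[a] - radius[a] for a in active)
    for a in sorted(active):
        if est[a] + radius[a] < best_lcb:
            # The radius at elimination time tracks the true gap of arm a
            # up to constant factors.
            gap_estimates[a] = 2.0 * radius[a]
            active.remove(a)
    if len(active) == 1:
        break
```

The point of the sketch is that elimination times themselves carry gap information, which is what the players exploit without needing collisions to communicate.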
Robustness Guarantees for Mode Estimation with an Application to Bandits
Mode estimation is a classical problem in statistics with a wide range of
applications in machine learning. Despite this, little is understood about
its robustness properties under possibly adversarial data contamination. In
this paper, we give precise robustness guarantees as well as privacy guarantees
under simple randomization. We then introduce a theory for multi-armed bandits
where the values of interest are the modes of the reward distributions rather than the means.
We prove regret guarantees for the problems of top arm identification, top
m-arms identification, contextual modal bandits, and infinite continuous arms
top arm recovery. We show in simulations that our algorithms are robust to
perturbation of the arms by adversarial noise sequences, thus rendering modal
bandits an attractive choice in situations where the rewards may have outliers
or adversarial corruptions.
Comment: 12 pages, 7 figures, 14 appendix pages
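The robustness contrast between mode and mean can be seen with a toy experiment. This is a generic histogram-based mode estimator, not the paper's estimator, and the 5% contamination level and bin count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
# Clean samples concentrated near 0; the distribution's mode is ~0.
clean = rng.normal(0.0, 1.0, size=1000)
# Adversarial contamination: 5% of points moved to an extreme value.
contaminated = clean.copy()
contaminated[:50] = 100.0

def mode_estimate(x, bins=50):
    """Histogram-based mode estimate: midpoint of the most populated bin."""
    counts, edges = np.histogram(x, bins=bins)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])

# The sample mean is dragged far from 0 by the outliers,
# while the mode estimate stays near the bulk of the data.
mean_shift = contaminated.mean() - clean.mean()
mode_shift = mode_estimate(contaminated) - mode_estimate(clean)
```

Here the mean moves by roughly 0.05 x 100 = 5 under contamination, while the most populated histogram bin, and hence the mode estimate, stays near zero.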
Anytime Model Selection in Linear Bandits
Model selection in the context of bandit optimization is a challenging
problem, as it requires balancing exploration and exploitation not only for
action selection, but also for model selection. One natural approach is to rely
on online learning algorithms that treat different models as experts. Existing
methods, however, scale poorly (poly M) with the number of models M
in terms of their regret. Our key insight is that, for model selection in
linear bandits, we can emulate full-information feedback to the online learner
with a favorable bias-variance trade-off. This allows us to develop ALEXP,
which has an exponentially improved (log M) dependence on M for its
regret. ALEXP has anytime guarantees on its regret, and neither requires
knowledge of the horizon n, nor relies on an initial purely exploratory
stage. Our approach utilizes a novel time-uniform analysis of the Lasso,
establishing a new connection between online learning and high-dimensional
statistics.
Comment: 37 pages, 7 figures
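The log M dependence comes from running an exponential-weights learner over the M models with (emulated) full-information feedback. The sketch below shows only that generic mechanism: the Bernoulli loss rates, horizon, and learning rate are illustrative assumptions, and ALEXP's Lasso-based construction of the full-information loss estimates is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)
M, T = 50, 2000
# Hypothetical Bernoulli loss rates for M candidate models; model 0 is best.
loss_rates = np.full(M, 0.6)
loss_rates[0] = 0.4

eta = np.sqrt(np.log(M) / T)  # standard exponential-weights learning rate
log_w = np.zeros(M)
for _ in range(T):
    losses = (rng.random(M) < loss_rates).astype(float)
    # Full-information update: every model's loss is observed each round,
    # which is the kind of feedback ALEXP emulates for the online learner.
    log_w -= eta * losses

# Posterior weights over models after T rounds.
probs = np.exp(log_w - log_w.max())
probs /= probs.sum()
```

With full-information feedback, exponential weights suffers regret of order sqrt(T log M), which is where the logarithmic rather than polynomial dependence on M originates.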